Bidirectional Recurrent Neural Network with Attention Mechanism for Punctuation Restoration

Authors

  • Ottokar Tilk
  • Tanel Alumäe
Abstract

Automatic speech recognition systems generally produce unpunctuated text, which is difficult for humans to read and degrades the performance of many downstream machine processing tasks. This paper introduces a bidirectional recurrent neural network model with an attention mechanism for punctuation restoration in unsegmented text. The model can utilize long contexts in both directions and direct attention where necessary, enabling it to outperform the previous state of the art on English (IWSLT2011) and Estonian datasets by a large margin.
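The abstract describes a bidirectional recurrent tagger that attends over long context in both directions; the full architecture is not given here, so the following is only a minimal PyTorch sketch of that general idea, not the authors' published model. The use of GRU cells, the layer sizes, the label inventory, and the one-punctuation-slot-per-word labelling convention are all assumptions.

```python
import torch
import torch.nn as nn

PUNCT_LABELS = ["", ",", ".", "?"]  # assumed label set; the paper's inventory may differ

class PunctuationRestorer(nn.Module):
    """Bidirectional recurrent tagger with additive attention over the full sequence.

    Each word's bidirectional state queries the whole context through attention,
    and the combined vector is classified into the punctuation mark for that slot.
    """

    def __init__(self, vocab_size, emb_dim=128, hidden=256, n_labels=len(PUNCT_LABELS)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.birnn = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        # additive (Bahdanau-style) attention parameters
        self.attn_w = nn.Linear(2 * hidden, hidden)
        self.attn_q = nn.Linear(2 * hidden, hidden)
        self.attn_v = nn.Linear(hidden, 1)
        self.out = nn.Linear(4 * hidden, n_labels)

    def forward(self, tokens):                       # tokens: (batch, seq_len)
        states, _ = self.birnn(self.embed(tokens))   # (batch, seq_len, 2*hidden)
        # score every position against every other position
        query = self.attn_q(states).unsqueeze(2)     # (batch, seq, 1, hidden)
        keys = self.attn_w(states).unsqueeze(1)      # (batch, 1, seq, hidden)
        scores = self.attn_v(torch.tanh(query + keys)).squeeze(-1)  # (batch, seq, seq)
        weights = torch.softmax(scores, dim=-1)
        context = weights @ states                   # (batch, seq, 2*hidden)
        return self.out(torch.cat([states, context], dim=-1))  # logits per slot
```

Training such a tagger would pair each word position with the punctuation mark to restore at that slot and minimize cross-entropy over the logits.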


Similar articles

LSTM for punctuation restoration in speech transcripts

The output of automatic speech recognition systems is generally an unpunctuated stream of words, which is hard to process for both humans and machines. We present a two-stage model based on recurrent neural networks with long short-term memory units to restore punctuation in speech transcripts. In the first stage, textual features are learned on a large text corpus. The second stage combines textua...


Joint Learning of Correlated Sequence Labelling Tasks Using Bidirectional Recurrent Neural Networks

The stream of words produced by Automatic Speech Recognition (ASR) systems is devoid of punctuation and formatting. Most natural language processing applications expect segmented and well-formatted texts as input, which ASR output does not provide. This paper proposes a novel technique for jointly modelling multiple correlated tasks such as punctuation and capitalization using bi...

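The excerpt above treats punctuation and capitalization as jointly modelled sequence-labelling tasks. Below is a hedged sketch of that setup only, assuming a shared bidirectional encoder with one softmax head per task and a weighted sum of the two losses; the paper's actual sharing scheme, label sets, and loss weighting are not given in the excerpt.

```python
import torch
import torch.nn as nn

class JointPunctCapTagger(nn.Module):
    """Shared bidirectional encoder with two task-specific classification heads.

    One BiLSTM encodes the word stream; separate linear layers predict
    punctuation and capitalization labels for every token.
    """

    def __init__(self, vocab_size, n_punct, n_case, emb_dim=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.punct_head = nn.Linear(2 * hidden, n_punct)  # e.g. none/comma/period/question
        self.case_head = nn.Linear(2 * hidden, n_case)    # e.g. lower/capitalized/all-caps

    def forward(self, tokens):
        states, _ = self.encoder(self.embed(tokens))
        return self.punct_head(states), self.case_head(states)

def joint_loss(punct_logits, case_logits, punct_gold, case_gold, alpha=0.5):
    """Weighted sum of the two cross-entropy losses; alpha is an assumed mixing weight."""
    ce = nn.CrossEntropyLoss()
    loss_p = ce(punct_logits.flatten(0, 1), punct_gold.flatten())
    loss_c = ce(case_logits.flatten(0, 1), case_gold.flatten())
    return alpha * loss_p + (1 - alpha) * loss_c
```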


Attention-based Recurrent Neural Networks for Question Answering

Machine Comprehension (MC) of text is an important problem in Natural Language Processing (NLP) research, and the task of Question Answering (QA) is a major way of assessing MC outcomes. One QA dataset that has gained immense popularity recently is the Stanford Question Answering Dataset (SQuAD). Successful models for SQuAD have all involved the use of Recurrent Neural Networks (RNNs), and most o...


Multiple Range-Restricted Bidirectional Gated Recurrent Units with Attention for Relation Classification

Most neural approaches to relation classification have focused on finding short patterns that represent the semantic relation using Convolutional Neural Networks (CNNs), and those approaches have generally achieved better performance than Recurrent Neural Networks (RNNs). Following a similar intuition to the CNN models, we propose a novel RNN-based model that strongly focuses on only importan...

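This last excerpt is cut off before the range-restriction mechanism is explained, so the sketch below only illustrates one plausible reading of the title: a shared bidirectional GRU applied to entity-centred spans (left of the first entity, between the two entities, right of the second), each pooled by attention before the relation softmax. The span definitions, dimensions, and pooling scheme are assumptions, not the paper's method.

```python
import torch
import torch.nn as nn

class RangeRestrictedRelationClassifier(nn.Module):
    """Bidirectional GRU applied to entity-centred spans, pooled with attention.

    A hypothetical sketch: each of the three spans is encoded by a shared BiGRU,
    reduced to a vector by learned attention pooling, and the concatenation is
    classified into a relation label.
    """

    def __init__(self, vocab_size, n_relations, emb_dim=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bigru = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.out = nn.Linear(3 * 2 * hidden, n_relations)

    def encode_span(self, tokens):
        states, _ = self.bigru(self.embed(tokens))         # (batch, span_len, 2*hidden)
        weights = torch.softmax(self.attn(states), dim=1)  # attention over span positions
        return (weights * states).sum(dim=1)               # (batch, 2*hidden)

    def forward(self, left_span, middle_span, right_span):
        pooled = [self.encode_span(s) for s in (left_span, middle_span, right_span)]
        return self.out(torch.cat(pooled, dim=-1))         # relation logits
```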



Journal title:

Volume   Issue

Pages   -

Publication date: 2016